Assignment Overview
You do not need to replicate ALL of the analyses presented in the paper, but at minimum you must replicate at least 3 analyses, including at least one descriptive statistical analysis and one inferential statistical analysis. As part of this assignment, you must also replicate to the best of your abilities at least one figure.
“Gregariousness, foraging effort, and affiliative interactions in lactating bonobos and chimpanzees” 2021 - Sean M. Lee, Gottfried Hohmann, Elizabeth V. Lonsdorf, Barbara Fruth,& Carson M. Murraya
This study aimed to investigate the cost of fission-fusion social dynamics on lactating bonobos and chimpanzees given their well-established differences in gregariousness and feeding competition. Lactation is energetically expensive and often requires females to alter their energy expenditure and intake in order to maintain the necessary positive energy balance. The authors predicted that lactating chimpanzees offset the cost of lactation by being less gregarious and spending more time alone, thus mitigating potential costs of intense feeding competition. They also predicted that less gregariousness in lactating chimpanzees would constrain their social budget, resulting in less affiliative interactions overall. The study specifically looked at a community of chimpanzees from Gombe, Tanzania and the Bompusa West bonobo community at LuiKotale, Democratic Republic of the Congo. They collected party scans and detailed behavioral data during focal follows, standardizing data collection across both research sites for easier analysis.
In brief the three predictions tested in this study were:
1. Lactating chimpanzees spend more time alone with their immature offspring than do lactating bonobos
2. Lactating females of the two species do not differ in feeding or travel time
3. Lactating bonobos spend more time engage in social interactions, particularly with individuals other than their immature offspring.
Data on lactating females was pooled into three age bins based on the age of their youngest dependent offspring (0 < 1.5, 1.5 < 3, and 3 < 4.5 years). These categories were included in months within their raw data. To test their predictions, generalized linear mixed models with beta-binomial error structure were fit using the package {glmmTMB} and nonparametric dispersion tests were run with testDispersion function from {DHARMa}. Parameter interactions/significance were tested using the Anova function from {car} and fit models were visually assessed with plotted residuals and QQ plots.
GLMM analysis revealed that in support of the first two predictions, lactating chimpanzees were less gregarious (spent more time alone) than lactating female bonobos. However, they also found lactating female chimpanzees and bonobos did not differ in their social interaction time, and that lactating chimpanzees actually spent proportionally more time affiliating with others. These results suggest that lactating chimpanzees do mitigate feeding competition by being less gregarious, but are still able to maintain their foraging budgets compared to bonobos. I found these results both interesting and surprising, especially because the authors did not provide any clear explanation for why bonobos are more gregarious yet, in this study, do not engage in as much social interactions when compared to lactating chimpanzees. The results of this study complicate the differences between chimpanzees and bonobos, which are often oversimplified in the popular literature. This topic requires closer investigation to better understand the nuances of fission-fusion societies and the formation/maintenance of social relationships.
I first selected an article titled, “Attractiveness of female sexual signaling predicts differences in female grouping patterns between bonobos and chimpanzees” by Surbeck et al. (2021) to replicate for this assignment. I found the study and their results extremely interesting, especially since I recently heard Surbeck present his research at NEEP and have long heard the idea that bonobos are more gregarious than chimps because of ecological differences. However, as I read through the paper and began to try and manipulate the dataset they shared on Dryad, I realized I was missing a lot of the details necessary to run the models they did in their analyses. The results and methodology sections at the end of the paper fail to include the parameters of their GLMMs, making it nearly impossible for me to try and understand and replicate - especially because this kind of modeling is brand new to me.
I then decided to look back over the other articles I had considered earlier for the assignment. This article (Lee et al. 2021) covered a somewhat similar topic while including much more detailed descriptions of their analyses! Though they explained their parameters/analyses fairly well, I still ran into some fun challenges because the authors fit GLMM using {glmmTMB}, which there is frustratingly little information on! Even after all the time I spent learning about glmms and reading about glmmTMB, I still don’t quite understand why they used glmmTMB and not glmer or a more popular glmm function/package. It seems that glmmTMB is preferred when working with zero-inflated models but this analysis did not use such models? Regardless, once I figured out how to run the first GLMM using glmmTMB, the rest of the analyses were fairly easy. I ended up replicating the two main tables from the article (Tables 2, and Tables 3) after fitting/running multiple glmmTMB models, as well as all 5 figures from the raw data. I have inserted images of the figures/tables from Lee et al. (2021) following my own replication of each in order to compare.
My first step of my replication analysis involved loading in the raw data published to Dryad by the article authors. The Dryad data was saved as an Excel file with 2 sheets: one with individual chimp/bonobo data on time spent ‘feeding, traveling, social, social interaction adjusted’ and the other with individual time spent ‘alone.’ I exported each individual Excel sheet into separate .csv files before uploading them to GitHub in order to curl the data into my R workspace.
#loading in 'alone' data
a <- curl("https://raw.githubusercontent.com/vzdanowicz/AN588_DataReplication_vrz/main/lee_alone.csv")
time_alone <- read.csv(a, header = TRUE, sep = ",", stringsAsFactors = FALSE)
names(time_alone)<-c('species','ID','age','total','alone') #updating frame column names for sanity
#loading in 'feed/travel/social' data
b <- curl("https://raw.githubusercontent.com/vzdanowicz/AN588_DataReplication_vrz/main/lee_behav.csv")
feed_travel_social <- read.csv(b, header = TRUE, sep = ",", stringsAsFactors = FALSE)
names(feed_travel_social)<-c('species','ID','age','total','feed','travel','social','social_adj')
#view
head(time_alone)
## species ID age total alone
## 1 bonobo Gwe 0--18 68 0
## 2 bonobo Iri 18--36 88 0
## 3 bonobo Iri 36--54 40 1
## 4 bonobo Nin 18--36 110 0
## 5 bonobo Olg 0--18 70 0
## 6 bonobo Olg 36--54 84 1
head(feed_travel_social)
## species ID age total feed travel social social_adj
## 1 bonobo Dju 0--18 1618 662 316 272 137
## 2 bonobo Gwe 0--18 3493 1415 386 720 334
## 3 bonobo Iri 0--18 2794 993 471 579 307
## 4 bonobo Nin 0--18 2621 1089 370 367 52
## 5 bonobo Olg 0--18 1425 383 202 95 5
## 6 bonobo Uma 0--18 2122 707 400 367 155
The article contains a fairly detailed description of their statistical analyses in R. I used the help() function to learn more about the {glmmTMB} package and functions used by these authors. The glmmTMB function fits generalized linear mixed models following lme4 syntax and using Template Model Builder (TMB). Below are some of the notes I kept track of while researching and trying to wrap my head around the TMB package since its a different version than glmer GLMMs.
formula,data = NULL,family = gaussian(),ziformula = ~0,dispformula = ~1,weights = NULL,offset = NULL,contrasts = NULL,na.action, se = TRUE,verbose = FALSE,doFit = TRUE,control = glmmTMBControl(),REML = FALSE,start = NULL,map = NULL,sparseX = NULL
Exploring & visualizing the datasets..
par(mfrow = c(1,2))
boxplot(data = time_alone, alone ~ age * species, group = time_alone$age)
boxplot(data = feed_travel_social, feed ~ age * species, col = c('cadetblue1','darkseagreen1'))
I was trying to get some visualizations of the entire dataset but was having a hard time without manipulating any of the raw data. The authors used summarized values like mean/se in their figures so I will go through and replicate that later on.
I was not sure if I needed to convert some of the frame values from characters into factors so I did it anyway to be safe! I did this by using the as.factor function and specifying the variables to coerce.
#convert column 'id' from character to factor
time_alone$ID <- as.factor(time_alone$ID)
time_alone$species <- as.factor(time_alone$species)
time_alone$age <- as.factor(time_alone$age)
str(time_alone)
## 'data.frame': 30 obs. of 5 variables:
## $ species: Factor w/ 2 levels "bonobo","chimpanzee": 1 1 1 1 1 1 1 1 1 1 ...
## $ ID : Factor w/ 19 levels "BAH","DL","EZA",..: 9 10 10 11 12 12 13 13 14 15 ...
## $ age : Factor w/ 3 levels "0--18","18--36",..: 1 2 3 2 1 3 1 2 1 3 ...
## $ total : int 68 88 40 110 70 84 48 27 30 119 ...
## $ alone : int 0 0 1 0 0 1 2 2 0 0 ...
str shows that I successfully manipulated the dataframe and all the variables are now either factors or integers. The factor levels come into play when grouping data for replicating the article figures.
Below, I break down the replication analysis by the three main predictions as outlined above. For each prediction, I fit a generalized linear mixed model using glmmTMB (with a beta-binomial error structure) to test the interaction of species and infant age before fitting an independent fixed-effects model. The authors included in the article that if the interaction of species/infant age was significant that they conducted Tukey’s pairwise post hoc comparison yet the step was not required for any of the tested predictions. In addition to replicating the models and figures, I also replicated their dispersion analysis with the testDispersion function from {DHARMa}and was able to match my dispersion values with theirs.
Prediction: Lactating female chimpanzees spend more time alone and are less gregarious than lactating bonobos
Lee et al. (2021) report that the response variable for this first model was calculated by dividing the number of scans of a given female by the total number of party scans collected on that female during that age class. Thankfully, the data was already organized by each species, female ID, and infant age class, so I only had to manipulate my dataframe time_alone to calculate the necessary response variable.
To do this - I first tried creating a new column in time_alone and then wrote a for loop that filled the appropriate proportion values into this new column using the code below.
time_alone$prop_alone <- NULL
for (i in 1:nrow(time_alone)) {
time_alone$prop_alone[i] <- time_alone$alone_scans[i]/time_alone$total_scans[i]
}
After running through that code chunk originally, I got my response variable and began trying to work through the GLMM. I started this process by carefully reading through the article to understand the parameters used, as well as read the {glmmTMB} package information and ecological GLMM examples from our course supplementary readings. I also had to Google what exactly a beta-binomial model meant (as described in the source article).
First I tested the interaction between species and infant age class (as outlined in the article). I first tried to create a full model that included the interaction and then a reduced model with both as independent fixed-effects and test the relationship with Anova.. like we had done in Mixed Effects module. But I kept getting the error message Error in fixef(mod)[[component]] : invalid subscript type ‘list’
I then realized Anova() function from {car} can actually directly test interactions within a single model. So I paired down to just have P1M1 model and then tested the significance of the interaction using Anova() Wald “Chisq” with type “III”. I also repeatedly got the wrong output to my model with the code above, it was printing all the variables and looked nothing like the article outputs. After lots more reading and messing around, I realized I could make the response variable the actual proportion of alone scans and total scans as long as I included weight = total (so weight with the denominator of the proportion of the response variable).
I could not figure out why my results did not look like those in the article… the values of the model above were close but didnt seem to be giving thee right output. I continued to read about the glmmTMB function and package to try and understand what piece I was missing. I eventually figured out that in order to have a proportion (non binary/Bernoulli) response variable in the model, it needs to be weighted. The {glmmTMB} package says: “Binomial models with more than one trial (i.e., not binary/Bernoulli) can either be specified in the form prob ~…,weights = N, or in the more typical two-column matrix cbind(successes,failures)~… form” (Magnusson et al. 2021). I found an example of someone doing this in a question thread online. They kept the response variable as the proportion without creating an additional column in their data frame and used the denominator as the weight. After attempting that version of the model, my output was finally identical to Lee et al. (2021)!! Unfortunately I didn’t keep the trial/error code because it complicated my knitting/object names but it was quite the process.
Model testing the interaction of
speciesand infantage
Below is the first model for prediction 1, testing the interaction effect of species and age to create the glmmTMB object, P1M1, and testing the significance with Anova. In all models I created, lactating female identity, ID, was added as random effect.
P1M1 <- glmmTMB(alone/total ~ species * age + (1 | ID), data = time_alone, family = betabinomial(link = "logit"), weight = total)
P1M1_anova <- Anova(P1M1, type = c("III"), test.statistic = "Chisq")
Before moving on, I made sure to compare the summary statistics of this model with the results from the article to ensure I included the same parameters and got similar enough numbers. I actually was able to exactly replicate their numbers for the models and model testing! After creating my replications of Table 3 and Table 4, we can more easily look at/compare my results with those of Lee et al. (2021).
summary(P1M1)
## Family: betabinomial ( logit )
## Formula: alone/total ~ species * age + (1 | ID)
## Data: time_alone
## Weights: total
##
## AIC BIC logLik deviance df.resid
## 151.8 163.0 -67.9 135.8 22
##
## Random effects:
##
## Conditional model:
## Groups Name Variance Std.Dev.
## ID (Intercept) 1.009e-08 0.0001005
## Number of obs: 30, groups: ID, 19
##
## Dispersion parameter for betabinomial family (): 13.6
##
## Conditional model:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -3.9000 0.7189 -5.425 5.8e-08 ***
## specieschimpanzee 2.5870 0.7646 3.383 0.000716 ***
## age18--36 -0.8451 1.2221 -0.692 0.489236
## age36--54 0.5001 0.9182 0.545 0.585949
## specieschimpanzee:age18--36 0.6804 1.3018 0.523 0.601210
## specieschimpanzee:age36--54 -0.8054 1.0334 -0.779 0.435795
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
print(P1M1_anova)
## Analysis of Deviance Table (Type III Wald chisquare tests)
##
## Response: alone/total
## Chisq Df Pr(>Chisq)
## (Intercept) 29.4281 1 5.803e-08 ***
## species 11.4475 1 0.0007159 ***
## age 1.3684 2 0.5044876
## species:age 1.5098 2 0.4700581
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Overall, this model (P1M1) and model testing revealed that the interaction between species and infant age was not significant - under the ’Response: alone/total" part of the anova output we can see that “species:age” p-val is 0.470. Thus, I next refitted the model to include species and infant age as independent fixed-effects. When I first worked through this replication, I pulled out the chisq x2, df, and p val into new objects after each Anova for making my figures (I did this using tidy function from {broom}.
Interaction was not significant so I then needed to model species and infant age as independent fixed-effects. I used about the same model function but changed species * age to species + age and used a type II Anova instead of the type III test needed for interaction affects.
P1M2 <- glmmTMB(alone/total ~ species + age + (1 | ID), data = time_alone, family = betabinomial(link = "logit"), weight = total)
P1M2_anova <- Anova(P1M2, type = c("II"), test.statistic = "Chisq")
summary(P1M2)
## Family: betabinomial ( logit )
## Formula: alone/total ~ species + age + (1 | ID)
## Data: time_alone
## Weights: total
##
## AIC BIC logLik deviance df.resid
## 149.4 157.8 -68.7 137.4 24
##
## Random effects:
##
## Conditional model:
## Groups Name Variance Std.Dev.
## ID (Intercept) 4.269e-08 0.0002066
## Number of obs: 30, groups: ID, 19
##
## Dispersion parameter for betabinomial family (): 13.9
##
## Conditional model:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -3.8044 0.5085 -7.481 7.37e-14 ***
## specieschimpanzee 2.4699 0.4814 5.130 2.89e-07 ***
## age18--36 -0.2671 0.4194 -0.637 0.524
## age36--54 -0.1396 0.4048 -0.345 0.730
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
print(P1M2_anova)
## Analysis of Deviance Table (Type II Wald chisquare tests)
##
## Response: alone/total
## Chisq Df Pr(>Chisq)
## species 26.3209 1 2.891e-07 ***
## age 0.4135 2 0.8132
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
I again checked the results of the model summary and Anova below matched Lee et al.(2021). And they do! We can see from the Anova results above that species had a significant affect on the model with a very low p val. After confirming my models and model tests, I moved on to running dispersion tests for both the interaction model (P1M1) and independent model (P1M2).
Before running the dispersion tests I had to spend some time reading about the {DHARMa} package and what simulating residuals will look like to code and plot. The article included the deviance ratios and p-vals for the dispersion tests but did not include any of the visualizations or further explanations. The code below goes through the residual simulation using the P1M1 model.
P1M1_sim <- simulateResiduals(fittedModel = P1M1, plot = TRUE) #default is 250, authors may have increased because #s slightly dif
testDispersion(P1M1_sim)
##
## DHARMa nonparametric dispersion test via sd of residuals fitted vs.
## simulated
##
## data: simulationOutput
## dispersion = 0.87923, p-value = 0.96
## alternative hypothesis: two.sided
My values (deviance = 8.7923, P = 0.96) are slightly different than those from the article (deviance = 0.957, P = 0.960), but that’s to be expected when running a simulation.
I also used testDispersion for the independent model for prediction one, P1M2
P1M2_sim <- simulateResiduals(fittedModel = P1M2, plot = TRUE) #default is 250, authors may have increased because #s slightly dif
testDispersion(P1M2_sim)
##
## DHARMa nonparametric dispersion test via sd of residuals fitted vs.
## simulated
##
## data: simulationOutput
## dispersion = 0.97202, p-value = 0.896
## alternative hypothesis: two.sided
Also got the same P val for this dispersion test but different deviance ratio, again likely because of simulation differences (dispersion = 0.972, P = 0.896). The article reported the following dispersion test results for P1M2: deviance ratio = 1.002 and P = 0.928.
Prediction: Lactating females chimpanzees and bonobos do not differ in feeding/travel time
Lee et al. (2021) report that the response variable for the models to test Prediction 2 was calculated by dividing the number of point scans of a given female engaged in feeding or travel, during each infant age class, by the total number of scans collected on that female. The rest of the models and figures 2-5 all involve the second dataframe, feed_travel_social. Similarly, before getting started with the models for Prediction 2 I first changed character values to factor in the other data.
#convert column 'id' from character to factor
feed_travel_social$ID <- as.factor(feed_travel_social$ID)
feed_travel_social$species <- as.factor(feed_travel_social$species)
feed_travel_social$age <- as.factor(feed_travel_social$age)
str(feed_travel_social)
## 'data.frame': 34 obs. of 8 variables:
## $ species : Factor w/ 2 levels "bonobo","chimpanzee": 1 1 1 1 1 1 1 1 1 1 ...
## $ ID : Factor w/ 26 levels "BAH","Dju","DL",..: 2 11 13 15 17 24 25 26 19 22 ...
## $ age : Factor w/ 3 levels "0--18","18--36",..: 1 1 1 1 1 1 1 1 2 2 ...
## $ total : int 1618 3493 2794 2621 1425 2122 2717 1314 2196 1969 ...
## $ feed : int 662 1415 993 1089 383 707 1198 604 969 834 ...
## $ travel : int 316 386 471 370 202 400 586 225 490 478 ...
## $ social : int 272 720 579 367 95 367 366 140 279 223 ...
## $ social_adj: int 137 334 307 52 5 155 122 57 93 55 ...
Like with the models for Prediction 1, I started off by testing the interaction between species and infant age. The function below is about the same at the one used in the P1M1 interaction model other than adjustments in the variables being called since I am now modeling the proportion of time spent feeding and the influence of species*age.
P2M1 <- glmmTMB(feed/total ~ species * age + (1 | ID), data = feed_travel_social, family = betabinomial(link = "logit"), weight = total)
P2M1_anova <- Anova(P2M1, type = c("III"), test.statistic = "Chisq")
summary(P2M1)
## Family: betabinomial ( logit )
## Formula: feed/total ~ species * age + (1 | ID)
## Data: feed_travel_social
## Weights: total
##
## AIC BIC logLik deviance df.resid
## 455.3 467.5 -219.7 439.3 26
##
## Random effects:
##
## Conditional model:
## Groups Name Variance Std.Dev.
## ID (Intercept) 0.03392 0.1842
## Number of obs: 34, groups: ID, 26
##
## Dispersion parameter for betabinomial family (): 56.6
##
## Conditional model:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.46398 0.11563 -4.013 6e-05 ***
## specieschimpanzee -0.11293 0.16074 -0.703 0.4823
## age18--36 0.11751 0.18741 0.627 0.5306
## age36--54 0.36437 0.17785 2.049 0.0405 *
## specieschimpanzee:age18--36 0.53117 0.28383 1.871 0.0613 .
## specieschimpanzee:age36--54 -0.05361 0.24885 -0.215 0.8294
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
print(P2M1_anova)
## Analysis of Deviance Table (Type III Wald chisquare tests)
##
## Response: feed/total
## Chisq Df Pr(>Chisq)
## (Intercept) 16.1019 1 6.003e-05 ***
## species 0.4936 1 0.4823
## age 4.1999 2 0.1225
## species:age 4.3585 2 0.1131
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
I again checked the results of the model summary and Anova matched Lee et al.(2021). And they do! We can see from the Anova results above that the interaction between species and infant age was not significant when looking at proportion of time spent feeding. The next step is to refit the model with species and age as independent fixed-effects.
Interaction was not significant so I next needed to model species and infant age as independent fixed-effects for the feeding data. I used about the same model function but changed species * age to species + age and used a type II Anova instead of the type III test needed for interaction affects. now looking at species and age as independent fixed effects
P2M2 <- glmmTMB(feed/total ~ species + age + (1 | ID), data = feed_travel_social, family = betabinomial(link = "logit"), weight = total)
P2M2_anova <- Anova(P2M2, type = c("II"), test.statistic = "Chisq")
summary(P2M2)
## Family: betabinomial ( logit )
## Formula: feed/total ~ species + age + (1 | ID)
## Data: feed_travel_social
## Weights: total
##
## AIC BIC logLik deviance df.resid
## 455.4 464.5 -221.7 443.4 28
##
## Random effects:
##
## Conditional model:
## Groups Name Variance Std.Dev.
## ID (Intercept) 0.028 0.1673
## Number of obs: 34, groups: ID, 26
##
## Dispersion parameter for betabinomial family (): 45.1
##
## Conditional model:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.5071 0.1088 -4.662 3.12e-06 ***
## specieschimpanzee -0.0226 0.1256 -0.180 0.8572
## age18--36 0.3544 0.1512 2.344 0.0191 *
## age36--54 0.3189 0.1323 2.411 0.0159 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
print(P2M2_anova)
## Analysis of Deviance Table (Type II Wald chisquare tests)
##
## Response: feed/total
## Chisq Df Pr(>Chisq)
## species 0.0324 1 0.85721
## age 8.3784 2 0.01516 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
age has significant effect on feeding model (supported by article)
Interact model first
P2M3 <- glmmTMB(travel/total ~ species * age + (1 | ID), data = feed_travel_social, family = betabinomial(link = "logit"), weight = total)
P2M3_anova <- Anova(P2M3, type = c("III"), test.statistic = "Chisq")
look at model/anova results and compare with paper
summary(P2M3)
## Family: betabinomial ( logit )
## Formula: travel/total ~ species * age + (1 | ID)
## Data: feed_travel_social
## Weights: total
##
## AIC BIC logLik deviance df.resid
## 391.5 403.7 -187.8 375.5 26
##
## Random effects:
##
## Conditional model:
## Groups Name Variance Std.Dev.
## ID (Intercept) 0.02284 0.1511
## Number of obs: 34, groups: ID, 26
##
## Dispersion parameter for betabinomial family (): 303
##
## Conditional model:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.59316 0.07804 -20.415 <2e-16 ***
## specieschimpanzee -0.12538 0.11473 -1.093 0.274
## age18--36 0.14736 0.16062 0.917 0.359
## age36--54 0.15403 0.12064 1.277 0.202
## specieschimpanzee:age18--36 0.20379 0.25869 0.788 0.431
## specieschimpanzee:age36--54 -0.06929 0.16406 -0.422 0.673
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
print(P2M3_anova)
## Analysis of Deviance Table (Type III Wald chisquare tests)
##
## Response: travel/total
## Chisq Df Pr(>Chisq)
## (Intercept) 416.7576 1 <2e-16 ***
## species 1.1943 1 0.2745
## age 2.6811 2 0.2617
## species:age 0.8496 2 0.6539
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
now looking at species and age as independent fixed effects
P2M4 <- glmmTMB(travel/total ~ species + age + (1 | ID), data = feed_travel_social, family = betabinomial(link = "logit"), weight = total)
P2M4_anova <- Anova(P2M4, type = c("II"), test.statistic = "Chisq")
look at model/anova results and compare with paper
summary(P2M4)
## Family: betabinomial ( logit )
## Formula: travel/total ~ species + age + (1 | ID)
## Data: feed_travel_social
## Weights: total
##
## AIC BIC logLik deviance df.resid
## 388.5 397.6 -188.2 376.5 28
##
## Random effects:
##
## Conditional model:
## Groups Name Variance Std.Dev.
## ID (Intercept) 0.01065 0.1032
## Number of obs: 34, groups: ID, 26
##
## Dispersion parameter for betabinomial family (): 203
##
## Conditional model:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.60571 0.06880 -23.338 < 2e-16 ***
## specieschimpanzee -0.09255 0.08014 -1.155 0.24818
## age18--36 0.25062 0.09433 2.657 0.00789 **
## age36--54 0.10520 0.08471 1.242 0.21428
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
print(P2M4_anova)
## Analysis of Deviance Table (Type II Wald chisquare tests)
##
## Response: travel/total
## Chisq Df Pr(>Chisq)
## species 1.3335 1 0.24818
## age 7.1529 2 0.02798 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
age has significant effect on feeding model (supported by article)
Next need to look at nonparametric dispersion tests using the testDispersion() function from {DHARMa}… first need to simulateResiduals
P2M1_sim <- simulateResiduals(fittedModel = P2M1, plot = TRUE) #feeding interaction
testDispersion(P2M1_sim)
##
## DHARMa nonparametric dispersion test via sd of residuals fitted vs.
## simulated
##
## data: simulationOutput
## dispersion = 1.1285, p-value = 0.568
## alternative hypothesis: two.sided
P2M2_sim <- simulateResiduals(fittedModel = P2M2, plot = TRUE) #feeding independent
testDispersion(P2M2_sim)
##
## DHARMa nonparametric dispersion test via sd of residuals fitted vs.
## simulated
##
## data: simulationOutput
## dispersion = 1.3363, p-value = 0.136
## alternative hypothesis: two.sided
NEED TO PULL OUT PVAL AND DEVIANCE RATIO FOR INTERACTION/INDEPENDENT DISPERSION TESTS Not getting exactly same value for deviance ratio… could have to do with how data was simulated?
P2M3_sim <- simulateResiduals(fittedModel = P2M3, plot = TRUE) #travel interaction
testDispersion(P2M3_sim)
##
## DHARMa nonparametric dispersion test via sd of residuals fitted vs.
## simulated
##
## data: simulationOutput
## dispersion = 0.72744, p-value = 0.2
## alternative hypothesis: two.sided
P2M4_sim <- simulateResiduals(fittedModel = P2M4, plot = TRUE) #travel independent
testDispersion(P2M4_sim)
##
## DHARMa nonparametric dispersion test via sd of residuals fitted vs.
## simulated
##
## data: simulationOutput
## dispersion = 0.80201, p-value = 0.36
## alternative hypothesis: two.sided
Also got the same P val for this dispersion test but different deviance ratio (again prob because of simulation differences)…
Pulling glmmTMB model results into tibble then dataframe to manipulate into a table. I unfortunately couldn’t just use {stargazer} or similiar packages because next-to-nothing is compatible with glmmTMB outputs… which is annoying!
#time alone interaction
P1M1_tidy <- tidy(P1M1)
P1M1_df <- as.data.frame(P1M1_tidy)
#time alone independent
P1M2_tidy <- tidy(P1M2)
P1M2_df <- as.data.frame(P1M2_tidy)
#time feeding interaction
P2M1_tidy <- tidy(P2M1)
P2M1_df <- as.data.frame(P2M1_tidy)
#time feeding independent
P2M2_tidy <- tidy(P2M2)
P2M2_df <- as.data.frame(P2M2_tidy)
#time traveling interaction
P2M3_tidy <- tidy(P2M3)
P2M3_df <- as.data.frame(P2M3_tidy)
#time traveling independent
P2M4_tidy <- tidy(P2M4)
P2M4_df <- as.data.frame(P2M4_tidy)
#time social interaction
P3M1_tidy <- tidy(P3M1)
P3M1_df <- as.data.frame(P3M1_tidy)
#time social independent
P3M2_tidy <- tidy(P3M2)
P3M2_df <- as.data.frame(P3M2_tidy)
#time social adjusted interaction
P3M3_tidy <- tidy(P3M3)
P3M3_df <- as.data.frame(P3M3_tidy)
#time social adjusted independent
P3M4_tidy <- tidy(P3M4)
P3M4_df <- as.data.frame(P3M4_tidy)
#Table 3 - GLMM parameter estimates for independent effects models
alone_independent <- P1M2_df[c(1:4),c(4:8)]
feed_independent <- P2M2_df[c(1:4),c(4:8)]
travel_independent <- P2M4_df[c(1:4),c(4:8)]
social_independent <- P3M2_df[c(1:4),c(4:8)]
social_adj_independent <- P3M4_df[c(1:4),c(4:8)]
#Table 4 - GLMM parameter estimates for interaction effect models
alone_interaction <- P1M1_df[c(1,5,6),c(4:8)]
feed_interaction <- P2M1_df[c(1,5,6),c(4:8)]
travel_interaction <- P2M3_df[c(1,5,6),c(4:8)]
social_interaction <- P3M1_df[c(1,5,6),c(4:8)]
social_adj_interaction <- P3M3_df[c(1,5,6),c(4:8)]
change column names in data frames
alone_independent$term <- c("Intercept", "Chimpanzee","Infant age class 1.5 < 3","Infant age class 3 < 4.5")
feed_independent$term <- c("Intercept", "Chimpanzee","Infant age class 1.5 < 3","Infant age class 3 < 4.5")
travel_independent$term <- c("Intercept", "Chimpanzee","Infant age class 1.5 < 3","Infant age class 3 < 4.5")
social_independent$term <- c("Intercept", "Chimpanzee","Infant age class 1.5 < 3","Infant age class 3 < 4.5")
social_adj_independent$term <- c("Intercept", "Chimpanzee","Infant age class 1.5 < 3","Infant age class 3 < 4.5")
Preparing independent table (table 3)
Independent_Table <- rbind.data.frame(alone_independent,feed_independent,travel_independent,social_independent,social_adj_independent)
Independent_Table <- Independent_Table %>%
mutate_if(is.numeric, round, digits = 3)
Table 3! spent way too long trying to figure out how to italicize z and P.. decided to just italicize the entire first row
#Table3
kbl(Independent_Table, col.names = c("Model", "Estimate", "Standard error", "z", "P"), caption = "Table 3 - GLMM parameter estimates for independent effects models") %>%
kable_paper(html_font = "Cambria") %>%
pack_rows(index = c("Time Alone" = 4, "Feeding" = 4, "Travel" = 4, "Social interactions" = 4, "Adjusted social interactions" = 4))%>%
add_indent(1:20, level_of_indent = 10)%>%
row_spec(0, italic = T)
| Model | Estimate | Standard error | z | P |
|---|---|---|---|---|
| Time Alone | ||||
| Intercept | -3.804 | 0.509 | -7.481 | 0.000 |
| Chimpanzee | 2.470 | 0.481 | 5.130 | 0.000 |
| Infant age class 1.5 < 3 | -0.267 | 0.419 | -0.637 | 0.524 |
| Infant age class 3 < 4.5 | -0.140 | 0.405 | -0.345 | 0.730 |
| Feeding | ||||
| Intercept | -0.507 | 0.109 | -4.662 | 0.000 |
| Chimpanzee | -0.023 | 0.126 | -0.180 | 0.857 |
| Infant age class 1.5 < 3 | 0.354 | 0.151 | 2.344 | 0.019 |
| Infant age class 3 < 4.5 | 0.319 | 0.132 | 2.411 | 0.016 |
| Travel | ||||
| Intercept | -1.606 | 0.069 | -23.338 | 0.000 |
| Chimpanzee | -0.093 | 0.080 | -1.155 | 0.248 |
| Infant age class 1.5 < 3 | 0.251 | 0.094 | 2.657 | 0.008 |
| Infant age class 3 < 4.5 | 0.105 | 0.085 | 1.242 | 0.214 |
| Social interactions | ||||
| Intercept | -1.755 | 0.115 | -15.224 | 0.000 |
| Chimpanzee | 0.067 | 0.130 | 0.516 | 0.606 |
| Infant age class 1.5 < 3 | -0.157 | 0.163 | -0.960 | 0.337 |
| Infant age class 3 < 4.5 | -0.249 | 0.162 | -1.534 | 0.125 |
| Adjusted social interactions | ||||
| Intercept | -3.101 | 0.210 | -14.802 | 0.000 |
| Chimpanzee | 0.782 | 0.217 | 3.605 | 0.000 |
| Infant age class 1.5 < 3 | -0.082 | 0.298 | -0.276 | 0.782 |
| Infant age class 3 < 4.5 | -0.031 | 0.229 | -0.135 | 0.892 |
Table 3 - screenshot captured from Lee et al. (2021)
alone_interaction$term <- c("Intercept","Chimpanzee × Age 1.5 < 3", "Chimpanzee × Age 3 < 4.5")
feed_interaction$term <- c("Intercept","Chimpanzee × Age 1.5 < 3", "Chimpanzee × Age 3 < 4.5")
travel_interaction$term <- c("Intercept","Chimpanzee × Age 1.5 < 3", "Chimpanzee × Age 3 < 4.5")
social_interaction$term <- c("Intercept","Chimpanzee × Age 1.5 < 3", "Chimpanzee × Age 3 < 4.5")
social_adj_interaction$term <- c("Intercept","Chimpanzee × Age 1.5 < 3", "Chimpanzee × Age 3 < 4.5")
interaction table prep
Interaction_Table <- rbind.data.frame(alone_interaction,feed_interaction,travel_interaction,social_interaction,social_adj_interaction)
Interaction_Table <- Interaction_Table %>%
mutate_if(is.numeric, round, digits = 3)
rownames(Interaction_Table) <- c(1:15)
spent way too long trying to figure out how to italicize z and P.. decided to just italicize the entire first row
#Table 4
kbl(Interaction_Table, col.names = c("Model", "Estimate", "Standard error", "z", "P"), caption = "Table 4 - GLMM parameter estimates for interaction effects models") %>%
kable_paper(html_font = "Cambria") %>%
pack_rows(index = c("Time Alone" = 3, "Feeding" = 3, "Travel" = 3, "Social interactions" = 3, "Adjusted social interactions" = 3))%>%
add_indent(1:15, level_of_indent = 10)%>%
row_spec(0, italic = T)
| Model | Estimate | Standard error | z | P |
|---|---|---|---|---|
| Time Alone | ||||
| Intercept | -3.900 | 0.719 | -5.425 | 0.000 |
| Chimpanzee × Age 1.5 < 3 | 0.680 | 1.302 | 0.523 | 0.601 |
| Chimpanzee × Age 3 < 4.5 | -0.805 | 1.033 | -0.779 | 0.436 |
| Feeding | ||||
| Intercept | -0.464 | 0.116 | -4.013 | 0.000 |
| Chimpanzee × Age 1.5 < 3 | 0.531 | 0.284 | 1.871 | 0.061 |
| Chimpanzee × Age 3 < 4.5 | -0.054 | 0.249 | -0.215 | 0.829 |
| Travel | ||||
| Intercept | -1.593 | 0.078 | -20.415 | 0.000 |
| Chimpanzee × Age 1.5 < 3 | 0.204 | 0.259 | 0.788 | 0.431 |
| Chimpanzee × Age 3 < 4.5 | -0.069 | 0.164 | -0.422 | 0.673 |
| Social interactions | ||||
| Intercept | -1.749 | 0.126 | -13.840 | 0.000 |
| Chimpanzee × Age 1.5 < 3 | -0.126 | 0.324 | -0.390 | 0.696 |
| Chimpanzee × Age 3 < 4.5 | 0.206 | 0.292 | 0.705 | 0.481 |
| Adjusted social interactions | ||||
| Intercept | -2.910 | 0.211 | -13.799 | 0.000 |
| Chimpanzee × Age 1.5 < 3 | 0.506 | 0.495 | 1.023 | 0.306 |
| Chimpanzee × Age 3 < 4.5 | 0.878 | 0.470 | 1.869 | 0.062 |
Table 4 - screenshot captured from Lee et al. (2021)
I first built a theme with the approximate fonts/sizes/format used in Lee et al. (2021), though I did add colors!
leetheme <- theme(plot.title = element_text(family = "Times", "bold", size = 12, hjust = 0.5),
legend.title = element_blank(),
legend.text = element_text(family = "Times", size = (10)),
axis.title = element_text(family = "Times", size = (12)),
axis.text = element_text(family = "Times", size = (12)))
I used the for loops below to add new columns with proportions of each behavior for figure-making
#prep for fig1
time_alone$pct <- NULL
for (i in 1:nrow(time_alone)) {
time_alone$pct[i] <- time_alone$alone[i]/time_alone$total[i]
}
#prep for fig2
feed_travel_social$feedpct <- NULL
for (i in 1:nrow(feed_travel_social)) {
feed_travel_social$feedpct[i] <- feed_travel_social$feed[i]/feed_travel_social$total[i]
}
#prep for fig3
feed_travel_social$travelpct <- NULL
for (i in 1:nrow(feed_travel_social)) {
feed_travel_social$travelpct[i] <- feed_travel_social$travel[i]/feed_travel_social$total[i]
}
#prep for fig4
feed_travel_social$socialpct <- NULL
for (i in 1:nrow(feed_travel_social)) {
feed_travel_social$socialpct[i] <- feed_travel_social$social[i]/feed_travel_social$total[i]
}
#prep for fig5
feed_travel_social$adjustpct <- NULL
for (i in 1:nrow(feed_travel_social)) {
feed_travel_social$adjustpct[i] <- feed_travel_social$social_adj[i]/feed_travel_social$social[i]
}
I used the summarise function to create new dataframes with the summarized values for the different behaviors. I grouped the frames by species and age to make it easier to plot.
#Data for figure 1
alone.data <- time_alone %>%
group_by(species, age) %>%
summarise(AvgPct = mean(pct), sd = sd(pct), n = n(), se = sd/sqrt(n))
#Data for figures 2-5
behav.data <- feed_travel_social%>%
group_by(species, age) %>%
summarise(AvgPct.Feed = mean(feedpct), feed.sd = sd(feedpct), feed.n = n(), feed.se = feed.sd/sqrt(feed.n), AvgPct.Travel = mean(travelpct), travel.sd = sd(travelpct), travel.n = n(), travel.se = travel.sd/sqrt(travel.n), AvgPct.Social = mean(socialpct), social.sd = sd(socialpct), social.n = n(), social.se = social.sd/sqrt(social.n), AvgPct.Adjust = mean(adjustpct), adjust.sd = sd(adjustpct), adjust.n = n(), adjust.se = adjust.sd/sqrt(adjust.n))
Figure 1 - Mean ± standard error percentage of time that lactating females spent ranging in parties with only their immature offspring.
fig1 <- ggplot(alone.data, aes(age, AvgPct)) + theme_classic() + leetheme + ggtitle("Time Alone")+
geom_bar(aes(fill = species), stat = "identity", position = position_dodge(0.8), width = 0.9)+
scale_y_continuous(labels = scales::percent_format(accuracy = 1), limits = c(0,.5))+
scale_x_discrete(labels = c("0 < 1.5", "1.5 < 3", "3 < 4.5"))+
xlab("\nAge of youngest infant (years)") + ylab("Percent of focal observation time\n")+
geom_errorbar(aes(ymin = AvgPct-se, ymax = AvgPct+se, group = species),width = 0.1, position = position_dodge(0.8))+
scale_fill_manual(values = c("darkseagreen4", "darkseagreen3"))
fig1
Figure 2 - Mean ± standard error percentage of time that lactating females spent feeding.
fig2 <- ggplot(behav.data, aes(age, AvgPct.Feed)) + theme_classic() + leetheme + ggtitle("Feeding")+
geom_bar(aes(fill = species), stat = "identity", position = position_dodge(0.8), width = 0.9)+
scale_y_continuous(labels = scales::percent_format(accuracy = 1), limits = c(0,1))+
scale_x_discrete(labels = c("0 < 1.5", "1.5 < 3", "3 < 4.5"))+
xlab("\nAge of youngest infant (years)") + ylab("Percent of focal observation time\n")+
geom_errorbar(aes(ymin = AvgPct.Feed-feed.se, ymax = AvgPct.Feed+feed.se, group = species),width = 0.1, position = position_dodge(0.8))+
scale_fill_manual(values = c("darkseagreen4", "darkseagreen3"))
fig2
Figure 3 - Mean ± standard error percentage of time that lactating females spent traveling.
fig3 <- ggplot(behav.data, aes(age, AvgPct.Travel)) + theme_classic() + leetheme + ggtitle("Travel")+
geom_bar(aes(fill = species), stat = "identity", position = position_dodge(0.8), width = 0.9)+
scale_y_continuous(labels = scales::percent_format(accuracy = 1), limits = c(0,.5))+
scale_x_discrete(labels = c("0 < 1.5", "1.5 < 3", "3 < 4.5"))+
xlab("\nAge of youngest infant (years)") + ylab("Percent of focal observation time\n")+
geom_errorbar(aes(ymin = AvgPct.Travel-travel.se, ymax = AvgPct.Travel+travel.se, group = species), width = 0.1, position = position_dodge(0.8))+
scale_fill_manual(values = c("darkseagreen4", "darkseagreen3"))
fig3
Figure 4 - Mean ± standard error percentage of time that lactating females spent engaged in social interactions with any community member.
fig4 <- ggplot(behav.data, aes(age, AvgPct.Social)) + theme_classic() + leetheme + ggtitle("Social Interactions")+
geom_bar(aes(fill = species), stat = "identity", position = position_dodge(0.8), width = 0.9)+
scale_y_continuous(labels = scales::percent_format(accuracy = 1), limits = c(0,.5))+
scale_x_discrete(labels = c("0 < 1.5", "1.5 < 3", "3 < 4.5"))+
xlab("\nAge of youngest infant (years)") + ylab("Percent of focal observation time\n")+
geom_errorbar(aes(ymin = AvgPct.Social-social.se, ymax = AvgPct.Social+social.se, group = species), width = 0.1, position = position_dodge(0.8))+
scale_fill_manual(values = c("darkseagreen4", "darkseagreen3"))
fig4
Figure 5 - Mean ± standard error percentage of social interactions in which lactating females spent engaged in social interactions with individuals other than their immature offspring.
fig5 <- ggplot(behav.data, aes(age, AvgPct.Adjust)) + theme_classic() + leetheme + ggtitle("Adjusted Social Interactions")+
geom_bar(aes(fill = species), stat = "identity", position = position_dodge(0.8), width = 0.9)+
scale_y_continuous(labels = scales::percent_format(accuracy = 1), limits = c(0,1))+
scale_x_discrete(labels = c("0 < 1.5", "1.5 < 3", "3 < 4.5"))+
xlab("\nAge of youngest infant (years)") + ylab("Percent of social interactions\n")+
geom_errorbar(aes(ymin = AvgPct.Adjust-adjust.se, ymax = AvgPct.Adjust+adjust.se, group = species),width = 0.1, position = position_dodge(0.8))+
scale_fill_manual(values = c("darkseagreen4", "darkseagreen3"))
fig5
Social Interaction (P3M1)
first test interaction of species and age when looking at social behav
look at model/anova results and compare with paper